Multidimensionality in Statistical, OLAP, and Scientific Databases
Abstract
Copyright © 2003, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited.

INTRODUCTION AND BACKGROUND

Much data can be viewed as multidimensional. The term multidimensional database typically refers to a collection of objects, each represented as a point in a multidimensional space. Even data represented in tabular form, such as relations, can be thought of as multidimensional data if each row (tuple) is treated as an object and the columns (attributes) are treated as the dimensions. For example, consider the table employee (personID, age, sex, salary) shown in Figure 1a. If each person is represented as a point in the multidimensional space of (age, sex, salary), then that table can be represented as in Figure 1b.

The utility of representing data in a multidimensional space is that certain features of the data are more natural to view this way. For example, it is natural to view clusters in the multidimensional space. In Figure 1b, one can easily see that there is a small cluster of highly paid people (perhaps representing managers, who are generally older) and a larger cluster of lower-paid people. We can also see "outliers," as is the case with the younger person with a high salary. Of course, these concepts extend to data in more than three dimensions, but such data cannot be viewed as easily. The problem of viewing high-dimensional data to identify clusters, outliers, and various patterns has been the subject of several research projects. An extensive review of such methods is provided in Keim & Kriegel (1996) and will not be discussed further here.

Some data is naturally multidimensional, such as two-dimensional or three-dimensional spatial data. For example, climate modelers prefer to view their observed or simulated data in a multidimensional structure representing space (two or three dimensions), time, and the variables being measured (temperature, wind velocity, etc.). In this case, certain operations, such as selecting spatial regions or ...

[Figure 1a: the employee table, with columns personID, age, sex, and salary.]
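
The idea of treating each tuple of the employee table as a point in (age, sex, salary) space can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the row values are hypothetical (the data behind Figure 1a is not reproduced in the text), the numeric encoding of sex is arbitrary, and `to_point` and `select_region` are made-up helper names. The sketch shows one simple operation on such points, an axis-aligned range selection, which here picks out the young, highly paid outlier.

```python
from typing import List, NamedTuple, Tuple


class Employee(NamedTuple):
    """One tuple of the employee(personID, age, sex, salary) relation."""
    person_id: int
    age: int
    sex: str
    salary: float


# Hypothetical rows: the actual values of Figure 1a are not given in the text.
EMPLOYEES = [
    Employee(1, 28, "F", 42_000.0),
    Employee(2, 31, "M", 45_000.0),
    Employee(3, 55, "F", 120_000.0),
    Employee(4, 58, "M", 125_000.0),
    Employee(5, 26, "M", 118_000.0),  # young but highly paid: an "outlier"
]

# Assumed numeric encoding for the categorical dimension.
SEX_CODES = {"F": 0.0, "M": 1.0}

Point = Tuple[float, float, float]


def to_point(row: Employee) -> Point:
    """Map a tuple to coordinates (age, sex, salary); personID only labels the point."""
    return (float(row.age), SEX_CODES[row.sex], row.salary)


def select_region(
    rows: List[Employee],
    age_range: Tuple[float, float],
    salary_range: Tuple[float, float],
) -> List[int]:
    """Return the IDs of rows whose points lie inside an axis-aligned box."""
    ids = []
    for row in rows:
        age, _sex, salary = to_point(row)
        in_age = age_range[0] <= age <= age_range[1]
        in_salary = salary_range[0] <= salary <= salary_range[1]
        if in_age and in_salary:
            ids.append(row.person_id)
    return ids


if __name__ == "__main__":
    # People aged at most 35 who earn at least 100,000.
    print(select_region(EMPLOYEES, (0, 35), (100_000, float("inf"))))  # prints [5]
```

The same box query generalizes to any number of dimensions by adding one range per axis, although a linear scan like this one is only a starting point for large data sets.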
Similar Resources
Data-Driven Multidimensional Design for OLAP
OLAP is a popular technology to query scientific and statistical databases, but its success heavily depends on a proper design of the underlying multidimensional (MD) databases (i.e., based on the fact / dimension paradigm). Relevantly, different approaches to automatically identify facts are nowadays available, but all MD design methods rely on discovering functional dependencies (FDs) to id...
PARSIMONY: An Infrastructure for Parallel Multidimensional Analysis and Data Mining
Multidimensional analysis and online analytical processing (OLAP) operations require summary information on multidimensional data sets. Most common are aggregate operations along one or more dimensions of numerical data values. Simultaneous calculation of multidimensional aggregates is provided by the Data Cube operator, used to calculate and store summary information on a number of dimensions...
Expressing OLAP Preferences
Multidimensional databases play a relevant role in statistical and scientific applications, as well as in business intelligence systems. Their users express complex OLAP queries, often returning huge volumes of facts, sometimes providing little or no information. Thus, expressing preferences could be highly valuable in this domain. The OLAP domain is representative of an unexplored class of pre...
High Performance Data Mining Using Data Cubes on Parallel Computers
On-Line Analytical Processing techniques are used for data analysis and decision support systems. The multidimensionality of the underlying data is well represented by multidimensional databases. For data mining in knowledge discovery, OLAP calculations can be effectively used. For these, high performance parallel systems are required to provide interactive analysis. Precomputed aggregate calcu...
SISYPHUS: A Chunk-Based Storage Manager for OLAP Cubes
In this paper, we present SISYPHUS, a storage manager for data cubes that provides an efficient physical base for performing OLAP operations. On-Line Analytical Processing (OLAP) poses new requirements to the physical storage layer of a database management system. Special characteristics of OLAP cubes such as multidimensionality, hierarchical structure of dimensions, data sparseness, etc., are ...